FOREWORD


In recognition of the importance of questions of validity to the Graduate
Record Examinations program, the GRE Board asked Dr. Warren W.
Willingham of the staff of Educational Testing Service to prepare a paper on
the subject of validity and the GRE that could provide a basis for further
Board discussion and decisions. After reviewing Dr. Willingham's paper,
members of the Board agreed that it was an excellent document, which might
well be of interest to others concerned with graduate admissions and the
transition from undergraduate to graduate study. Accordingly, the GRE Board
asked that this paper be published, and we are pleased to make it available.

Richard H. Armitage
Chairman, GRE Board
... of examinations sponsored ... attention. It is suggested in this paper,
however, that current issues concerning their validity are critical to the
immediate future of graduate admissions. Therefore, it is also suggested
that research on validity should have high priority in the GRE program over
the next several years.

The main purpose of this paper is to facilitate discussion of important
issues concerning validity and to work toward a framework that the GRE
Board Research Committee will find useful in assigning priorities and
initiating projects. Toward that end, subsequent sections of the paper
provide background, define the scope of the problem, and outline six major
objectives that might guide the Board's efforts in this important area of
research.


BACKGROUND OF THE PROBLEM


To appreciate the various aspects of validity that apply to the GRE
program, it is useful to consider the functions and role of the program, the
kinds of research on validity undertaken to date, and why research on
validity is important at this point in the life of the program.


Nature of the GRE program


The meaning one attaches to validity, as regards the GRE program,
depends on how one perceives the role of the program. The GRE program is
identified primarily as a series of examinations for use by graduate schools
in selecting students. The program has, however, a variety of functions
and a variety of constituents. The following functions can be identified:


• to help with the admission of students to graduate school through
  publications, research, advisory services, and forum activities;
• to provide examination programs as measures of students' potential for
  success in graduate education;
• to provide an objective national basis for understanding the nature and
  distribution of academic talent by analyzing and describing
  characteristics of relevant student groups;
• to facilitate educational and career guidance by providing information
  to students and faculty;
• to inform undergraduate institutions what the graduate community
  considers desirable preparation for graduate study.


The following constituencies can be distinguished:


• students of different age, sex, background, and so forth;
• administrators and their institutions;
• different academic disciplines and fields;
• master's as well as doctoral programs.


There is considerable variation in the extent to which the GRE program
serves these functions and constituencies, and there is, no doubt,
considerable difference of opinion regarding their priority. Some have a
recognized and explicit role in the program; the role of others may be
largely implicit. For example, as the program has operated thus far, the
roles of the last two functions listed above are more accurately
characterized as potentially important than as of primary concern. Also, it
seems likely that students comprise a more important constituency of the
program in their own minds than in the minds of many institutional sponsors.
Nonetheless, each of these functions and constituencies helps to define the
responsibilities of the program.


In discussing the program's functions, it is necessary to distinguish
between "the GRE program" and the "examination program." The "GRE
program" denotes the entire program structure: its organization,
governance, staffing, financing, research activities, major operational
components, and so on. The "examination program" has a narrower meaning: the
test or group of tests that a candidate may take or an institution require and
the directly related services such as directions to examinees and information
for interpreting their scores, guidelines for institutional use of the tests,
analytic reports, and so forth. Unless stated otherwise, "program" is used
hereafter to denote this narrower meaning.


Previous research

Research directly related to the validity of the examination program has
fallen into one of two partially overlapping categories: (A) a variety of
research carried out at ETS under GRE sponsorship and (B) validity studies
carried out at other institutions. The overview which follows indicates the
general types of GRE research that have a bearing on validity and the main
conclusions concerning predictive validity that seem warranted on the basis
of institutional studies. For reviews of methodological or other issues, see
the reports cited below.

The relevant GRE research over the past 10 to 15 years has been partly
concerned with traditional validity studies, but it has also treated a
variety of other topics bearing upon validity. Part A of the attached
bibliography lists 25 GRE publications that report such research. They
concern the following topics:


• methodological issues in the conduct of validity studies (Boldt, 1976;
  Reilly and Jackson, 1974; Rock, 1974, 1975)
• studies of the validity of the GRE in the selection of foreign graduate
  students (Harvey and Pitcher, 1963; Sharon, 1974)
• test bias and the use of the GRE in selection for graduate study
  (Echternacht, 1974; Flaugher, 1974)
• criterion problems and the analysis of what constitutes success in
  graduate study (Campbell, Freund, and Lannholm, 1965; Carlson, Evans,
  and Kuykendall, 1974; Reilly, 1974a, 1974b)
• studies concerning test use in selective admissions (Burns, 1970;
  Campbell, Hilton, and Pitcher, 1967; Burns, Dremuk, and others, 1971;
  Lannholm, 1962, 1968a; Madaus, 1966)
• special prediction studies and summaries of institutional validity studies
  (Lannholm, 1960, 1968b, 1972; Lannholm, Marco, and Schrader, 1968;
  Lannholm and Schrader, 1951; Olsen, 1955; Rock, 1972)


The institutional studies report on statistical analyses of the relationship
between GRE tests and other predictors to various criteria of success in
graduate study. Part B of the attached bibliography lists the 43 studies
reported between 1952 and 1972 that Willingham (1974) analyzed. These
studies were based on 198 independent sets of data and 616 validity
coefficients. The data indicate:
• Validity coefficients for various predictors of graduate grade-point
  average (GPA) tend to be somewhat lower than corresponding coefficients
  at the undergraduate level. This is not surprising considering the
  restricted range of talent frequently encountered at the graduate level
  (see, for example, Dawes, 1975).
• The undergraduate GPA is a moderately good predictor of graduate GPA and
  faculty ratings; it is a poor predictor of whether a student attains the
  Ph.D. Depending upon the success criterion used, the GRE composite of
  Verbal and Quantitative Ability scores is either slightly or
  substantially more valid than the undergraduate GPA.
• The GRE Advanced Test is the most generally valid predictor among those
  reviewed. It was typically more valid than the GRE Aptitude Test and had
  a higher validity than the undergraduate GPA in eight of the nine
  academic fields represented in the review.
• Recommendations are a fairly poor predictor of whether a student will
  successfully complete a doctoral program.
• A weighted composite including undergraduate GPA and one or more GRE
  scores typically provided a validity coefficient in the .40 to .45
  range. This was somewhat higher than the validity of GRE scores alone
  and substantially higher than the validity of undergraduate GPA alone.
  This was the case for each success criterion and practically every
  academic discipline represented.


In addition to these empirical results, a variety of methodological and
conceptual problems were cited that tend to create unusual difficulty in
demonstrating the validity of entrance examinations at the graduate level.
On the basis of these problems, the data available, and other considerations,
Willingham concluded that (1) the efficiency of prediction is not likely to be
enhanced merely through the development of improved predictors, and (2)
the main hope for improved effectiveness in predicting success in graduate
education lies in better definitions of what constitutes success, i.e., more
reliable criteria that are more clearly differentiated with respect to
training objectives.


Considering the number of graduate programs and the far-reaching
importance of their admissions policies and practices, few studies have been
made of the validity of the GRE for selecting graduate students. That fact
and the fact that validity should always be a prime responsibility in any
testing program are sufficient reasons for emphasizing research in this
area. Furthermore, several current circumstances make validity a special
concern of the GRE program. These circumstances follow three general
themes.

First, the selection of graduate students is of greater concern now than in
the past for the simple reason that many more students are involved.
Selection often cannot be handled on a personal basis and, at the same time,
the process is fragmented (typically along departmental lines) so that the
statistical technology of selection frequently cannot be applied effectively.
Concurrently, other trends are causing faculties to question the adequacy of
selection practices. Undergraduate grades are assumed to be inflated and
less trustworthy than in the past. New regulations to protect individual
privacy give further reason to doubt the usefulness of personal
recommendations. These developments suggest to some that GRE Verbal and
Quantitative Ability scores should perhaps have greater weight in selection.
At the same time there is increasing interest in the assessment of
"competence" as opposed to aptitude. For example, it is now argued by some
that selection in higher education should place more emphasis on traits that
come closer to the real requirements of professional work (Hodgkinson,
1976). In support of this view, the modest relationship between college
grades and adult success is frequently cited (Hoyt, 1965). All these
developments and considerations contribute uncertainty as to what
constitutes a valid basis for selecting students.

Second, these educational and methodological concerns are confounded
by social and legal issues that have gained great importance in the last few
years. To a considerable extent there is de facto acceptance of an egalitarian
philosophy of admission in many institutions at the undergraduate level, at
least in the public sector. In large part, admission to graduate study is still
based upon merit, but this general rule is sharply conditioned by the widely
perceived necessity to represent fairly those groups that constitute
minorities in graduate education. This necessity raises complex questions
concerning what constitutes unbiased selection when prediction is, as always,
imperfect. The social issue becomes an important legal issue when the courts
are asked to decide what constitutes a valid test and whether an institution
must always select the student with the highest probability of success.
Ironically, a decision either way is likely to raise questions of
implementation that will require far greater sophistication concerning the
validity of admission practices than presently exists. Whatever the
resolution, when admission to privilege is treated as a legal issue, those
responsible for the process must be able to defend its equity.

The third reason validity is currently such an important issue for the
GRE program is that the Board is sponsoring a systematic research and
development effort toward program renewal, i.e., shortening the Aptitude
Test, developing additional modules for optional use, examining ... to
institutions. Each of ... will require careful attention to the validity of
proposed ... Not only must presently valid tests and procedures be
maintained, ... soundness of any conceptions regarding valid measures and
procedures in graduate admission must be demonstrated.


Often the term validity is conceived as a relationship between a test score
and some measure of success. In considering what sorts of research on
validity the GRE Board might want to undertake, it is necessary to take into
account not only several conventional conceptions of validity, but also the
fact that the program has various parts and various social implications.


Conventional interpretations of validity
The most common forms of validity are generally referred to as content
validity, criterion-related validity, and construct validity. The definitions
quoted in the following paragraphs are taken from Standards for Educational
and Psychological Tests (American Psychological Association, 1974).
The emphasis below has been added.

"Evidence of content validity is required when the test user wishes to
estimate how an individual performs in the universe of situations the test is
intended to represent. Content validity is most commonly evaluated for
tests of skill or knowledge; it may also be appropriate to inquire into the
content validity of personality inventories, behavior checklists, or measures
of various aptitudes." Thus, content validity has special relevance to the
Advanced Tests since these examinations must represent subject fields
accurately and produce appraisals of knowledge that are fair regardless of the
fact that undergraduate curriculums vary from institution to institution.

"Criterion-related validities apply when one wishes to infer from a test
score an individual's most probable standing on some other variable called a
criterion. Statements of predictive validity [for example] indicate the extent
to which an individual's future level on the criterion can be predicted from a
knowledge of prior test performance. . . . For many test uses, such as for
selection decisions, . . . predictive validity provides the appropriate model
for evaluating the use of a test or test battery." Predictive validity is central
to the GRE program not only because the examinations are used to select
students likely to succeed in graduate study, but also because there is
increasing social and legal pressure against using tests for such purposes
unless there is clear public evidence of such a relationship.

"Evidence of construct validity is not found in a single study; rather,
judgments of construct validity are based upon an accumulation of research
results. In obtaining the information needed to establish construct validity,
the investigator begins by formulating hypotheses about the characteristics
of those who have high scores on the test in contrast to those who have low
scores. Taken together, such hypotheses form at least a tentative theory
. . ." In considerable part, the construct validity of the GRE rests upon
decades of psychometric research, indicating that verbal and quantitative
ability play a critical role in most types of intellectual work, and upon even
more extensive educational experience which indicates that frequently the
best predictor of future success in an academic field is early competence
indicated by a subject-matter test. Construct validation requires constant
attention, however, to insure that a test is actually measuring the construct
intended. For example, it is necessary to insure that a reading comprehension
test is not so complicated in content as to stress reasoning instead of
reading, or that a mathematics test does not use language that places a
premium upon knowledge of vocabulary. Naturally, construct validation is even
more demanding and important in the case of new measures in areas like
cognitive style and creativity.


Social interpretations of validity

A number of broad interpretations of validity are associated with the social
implications of test use. For example, there are such questions as the validity
of a test for different groups of people, test validity as reflected in the ways
test use affects the users, and longer range effects of using a particular test
in a larger social context. These are more recent interpretations of validity.
They deserve special consideration because the GRE program operates in an
unusually broad social context.

First, a valid test must be fair and appropriate for all individuals taking
the test. That is, it must be free of systematic bias and distortion vis-a-vis
the various populations or subgroups taking the test, and the test should
not have different meanings for such groups. Messick (1976) makes the
important point that one validates not a test, but an interpretation of data
derived from a specific procedure. Whether that procedure (test) has the
same properties and patterns of relationships in different population groups
is an important empirical question.

From a somewhat different angle, Thorndike and Hagen (1969) refer
generally to validity as "whether the test measures what we want it to
measure." Thus, while a test may be intended to measure knowledge of
American history, there are a number of things it is not intended to measure;
e.g., reading speed, cultural disadvantage, sex, age, language spoken in the
home, and so on. In this sense there are an indefinite number of ways in
which a test may be biased and an indefinite number of subgroups for which
it may not be appropriate. There is no way to guard against all such
possibilities, and it can easily happen that making a test fairer for one
person may make it less fair for another. It is evident, however, that bias
and ap-
... a procedure as a whole" (Cronbach, ...
... and constantly changing conditions ...
... the test itself, but depends upon appropriate outcomes ... the test. The
GRE Board cannot ... for ... instance of test misuse, ... the social and
educational implications of different ...


... alone; it competes for time and attention with other parts of the
examination program that may be equally valid for the same purpose or
more valid for other purposes. This leads to a third social interpretation
of ...


Cronbach (1971) distinguishes "educational importance" as a form of
validity equal in stature, and parallel, to content validity and construct
validity. He defines this form of validity as follows: "Does the test
measure ...


... tests ... do to some extent communicate to colleges what the graduate
community thinks colleges should teach and what students should learn. It
is also argued that the GRE program needs to reflect the important learning
outcomes of undergraduate education; i.e., it must follow the curriculum
instead of leading it. From either point of view, the content of the program
constitutes a message that has a bearing on education far beyond the
admissions process. Consequently the inherent relevance, significance, and
value of the traits measured deserve close attention.


These latter interpretations of validity bear especially upon the usefulness
and appropriateness of different components of the examination program,
both as they serve the immediate purposes for which they are intended and
as they may be justified in some broader educational sense. Obviously such
interpretations do not apply to isolated measures, but to the program as a
whole. These considerations and the foregoing discussion suggest a
broader ...


Construct validity of the program


Ideas concerning validity can be applied to tests and other individual
measures, to batteries or groups of measures, or to an integrated program
that may have a variety of pieces (e.g., a central core, a variety of test
options, biographical information, special measures, and so on). Each of
those pieces should have a rationale concerning its legitimate and useful
functions. But the pieces are not free-standing. The various components are
used in a context of related procedures, materials, services, and so on. Each
part of the program is to some extent dependent upon other parts and upon
an overall rationale as to how the program serves its functions and its
constituents.

These considerations suggest that the Board should be guided by an
overarching sense of the construct validity of the program. In this context, a
valid test is a defensible test; i.e., an accurate and fair measure of what you
want to measure and also one that is useful for its purpose. More
specifically, the notion of the construct validity of a program suggests that a
test or measure is a valid component of a program if it meets these
conditions:


1. It represents fairly what is intended. That is, it satisfies concerns such
   as content validity, construct validity, educational importance, and
   appropriateness for the examinees, both as one group and as subgroups.
2. Its use is demonstrably effective. It meets the requirements of
   criterion-related validity, predictive bias, characteristics of the
   program that affect test use, and legal issues concerning test use.
3. It serves a distinctive purpose in relation to other tests and measures in
   the program, that is, a purpose not served by the other tests and
   measures.

... In this section six research objectives are suggested in order to provide
for the GRE Board Research Committee's consideration specific proposals
for action. The objectives are as follows:


  I. To encourage and facilitate institutional validity studies
 II. To deal effectively with methodological issues concerning validity
     that require the GRE program's initiative
III. To develop improved criteria of success in graduate study
 IV. Population validity: How to improve it and enhance understanding
     of it
  V. To improve institutional use of summary program data
 VI. To systematically insure the validity of revised or new measures
     resulting from program renewal


In the following paragraphs an initial statement of each objective is
followed by a brief rationale and discussion of several issues relevant to the
objective. These issues are discussed either as general research needs or, in
some cases, as more specific possible projects. But the main purpose is to
suggest a framework for thinking about validity research that is needed.


Objective I: To encourage and facilitate institutional
validity studies

The American Psychological Association (1974) outlines a variety of
responsibilities of test sponsors for examining and establishing the validity
of measures they offer for use. In this paper we give special attention to
these responsibilities of the GRE program, but the conditions under which
tests may be valid or invalid are essentially unlimited because applications
vary so widely with respect to purpose, academic field, criteria, local
conditions, and so on. The GRE Board cannot hope to establish validity in
even a significant minority of the possible situations in which the tests may
be used.

Consequently, it is important for users to accept their own responsibility
for examining the validity of a test for the purpose and circumstances
they have in mind. As Cronbach (1971) states, "In the end, the
responsibility for valid use of a test rests on the person who interprets it.
The published research merely provides the interpreter with some facts and
concepts. He has to combine these with his other knowledge about the persons
he tests and the assignments or adjustment problems that confront them, to
decide what interpretations are warranted." But users confront many
problems in carrying out institutional validity studies. In many cases the
appropriate locale is the individual department, where, however, the number
of students may be small and the faculty may lack sufficient ... to ...


... the interest and possibility of carrying out a ... With technical advice
from staff at Educational Testing Service (ETS), the institution might
organize and ... in Princeton might analyze the data ... format. This
process might ... announcement of priority areas of interest; ... academic
fields, interesting possibilities for criterion development ...

... consist of a step-by-step notebook for doing local studies, useful
references and forms, a collection of relevant reports, and so on. This model
for encouraging local studies has the virtue of effectiveness, but it also
places ... of the responsibilities for initiative on the institution. ETS's
...


Another research need somewhat related to those above is the desirability
of developing effective relationships with institutions in order to facilitate
work in this area. On the one hand, both of the above possibilities can be
facilitated by working intensively with one or two institutions over a
reasonable period of time to explore the problems of conducting institutional
validity studies at the graduate level. Furthermore, there will likely be
need to develop a cooperative relationship with a variety of institutions in
order to validate experimental modules that may be considered for use
in the GRE program over the next several years. This need is ... Objective
VI, though the development of the necessary institutional ...


Objective II: To deal effectively with methodological issues
concerning validity that require the GRE program's initiative


... discussion as Objective III below. The most familiar technical issues
concern very small samples, often severe restriction in the range of talent
due to selection, and lack of confidence in the meaning and reliability of
undergraduate grades. The following paragraphs outline some of these issues
and suggest some possibly fruitful lines of research.

Validity studies carried out in individual departments are often based
upon small samples and a very restricted range of test scores and
undergraduate grades. These conditions often combine to produce low and
erratic validity coefficients. A related but different view of this problem is
the fact that there has been very little attention given to the validity of
the GRE among departments within individual disciplines or fields. It would
appear desirable to give additional attention to ways of pooling data across
departments, thereby mitigating the technical problems and also
demonstrating validity in a larger context. This requires use of some common
criterion. Perhaps the only one that would make sense is some general notion
of success in graduate education, such as completion of the degree or overall
faculty ratings.
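
The two technical points in the preceding paragraph, restriction of range
and pooling across departments, can be illustrated with a short numerical
sketch. It is offered only as an illustration and is not drawn from any of the
GRE studies cited here. It assumes the standard univariate correction for
direct range restriction (often called Thorndike's Case II) and a
sample-size-weighted average of correlations using Fisher's z
transformation. The function names and the example figures are hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Estimate the correlation in the unrestricted applicant group.

    r_restricted: validity coefficient observed among enrolled students.
    sd_ratio: predictor standard deviation among all applicants divided by
              the standard deviation among enrolled students (assumed known).
    Uses the univariate correction for direct selection on the predictor.
    """
    if not -1.0 <= r_restricted <= 1.0:
        raise ValueError("correlation must lie between -1 and 1")
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    r2 = r_restricted ** 2
    return (r_restricted * sd_ratio) / math.sqrt(1.0 - r2 + r2 * sd_ratio ** 2)


def pooled_correlation(departments):
    """Pool within-department correlations into one estimate.

    departments: iterable of (r, n) pairs, one per department.
    Each r is converted to Fisher's z, weighted by n - 3, averaged,
    and converted back to a correlation.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in departments:
        if n <= 3:
            raise ValueError("each department needs more than 3 students")
        weight = n - 3
        weighted_sum += weight * math.atanh(r)
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no departments supplied")
    return math.tanh(weighted_sum / total_weight)


if __name__ == "__main__":
    # Hypothetical department: observed r = .30, applicant SD twice the
    # enrolled-student SD.
    print(round(correct_for_range_restriction(0.30, 2.0), 3))
    # Hypothetical small departments within one field.
    print(round(pooled_correlation([(0.10, 12), (0.35, 20), (0.25, 15)]), 3))
```

The pooled estimate is meaningful only if the departments share a common
criterion, which is the requirement noted above.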

Another general possibility for dealing with these issues is a retrospective
nomination study; i.e., asking faculty in a number of departments within a
field to nominate outstanding and poor students over a period of several
years. It may be possible to develop substantial samples for studies within
selected fields. This type of study would also require a common criterion of
success, such as obtaining the Ph.D.

A special advantage of these two types of studies is the fact that they can
be carried out over a limited time period. But, in addition to narrowly
conceived validity studies, many especially interesting research questions
require longitudinal study. Serious consideration should be given to the
development of a longitudinal study that follows a carefully structured
sample of students through and beyond graduate education. Students from
several fields might be included with oversampling of special groups of
interest. With several spaced follow-ups, a group of cooperating students
could provide a valuable data base for a variety of studies in addition to
specific investigations designed at the outset. Topics of special interest in
such a longitudinal study would include follow-up of minority students
through and beyond graduate education, studies of patterns of attendance and
career choice, and analysis of the cost of graduate education and how
financing alternatives affect students' decisions.

The quality of undergraduate grades as a predictor is another problem. It
is commonly understood and accepted that a "B" at one institution is not
necessarily equivalent to a "B" at another institution. A good deal of
research at the undergraduate level has indicated that there is no value in
trying to adjust grades from different high schools if admissions decisions
are made on the basis of grades and an entrance examination. At the
graduate level, however, it seems that corrections for variations in grading
standards from one undergraduate college to another can sometimes improve
predictions slightly. Pitcher and Schrader (1972) found that multiple
... result has been found in the case of law school ...

... severely inflated and the fact that some faculty are ... to competitive
grading as a matter of principle. It is difficult to ...
... they forecast success in graduate study could provide a valuable service
by helping to influence evolving nontraditional practices in sound directions.
... suspect; ...
... depends upon many intangible ...


... asymmetric ... has been discarded in favor of a symmetric view.
According to this, persons are observed in situations. Some are artificial
occasions for observations, which we call tests, and some are situations
arising in the natural course of the person's work or schooling. Relating
these observations to each other tells one about the situational demands and
about the resources individuals bring to bear. To study the validity of a
test interpretation is to study how behavior in one situation is related to
behavior in another. Both observations reveal characteristics of the
individual, and both types of behavior should be understood."
Defensible and reliable criteria of success are likely to become more im-
portant in the face of lagging confidence in grades and the increasing need to
justify administrative actions, both with respect to admitting and dropping
students. Furthermore, admission standards are likely to come under in-
creasing scrutiny, partly by those speaking for underrepresented groups
who question the social equity of current practices and partly by those who
assert there is undue reliance upon aptitude tests and objective measures in
general. Should legal action require empirical justification of admission deci-
sions, the need to develop sound criteria will immediately become critical.
Willingham (1974) has urged much greater attention to the problem of
criteria, especially in the context of a broader view of predictor-criterion
relationships and alternate strategies of selection. The alternate strategies
depicted in Figure 1 imply that different departments or programs within
departments may emphasize different training objectives, which in turn
should be related to the way students are selected and the way their per-
formance is evaluated.
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Figure 1. Alternate prediction strategies in graduate education. Reproduced
with permission from Science, 1974, 183 (4122), 277. Copyright
1974 by the American Association for the Advancement of Science.
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success in achoo! and also should have a 
Y range criteria. That is, the test should have con~ 
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There are several strategies that might be helpful in developing improved
criteria. It seems especially important to encourage systematically the de-
velopment of better criteria in the routine execution of validity studies as
well as routine evaluation of students' performance. This might be ac-
complished partly by working with individual institutions to carry out and
publish model studies that can help to illustrate the development of differ-
ent types of criteria. For example, it would be desirable to illustrate and en-
courage the use of reliable criteria like rating scales, such as those developed
through GRE research (Carlson, Reilly, Mahoney, and Casserly, 1976), or
comprehensive examinations,


given additional legal interest in graduate admissions, it may be desirable to
undertake a fairly systematic analysis of how graduate departments view
... A useful follow-up to the Carlson, Evans, and Kuykendall (1974)
survey would be an intensive analysis of the rationale and basis upon which
departments evaluate their students, the evaluation procedures actually
employed, and the psychometric properties of the resulting criteria. This
sort of analysis could have considerable value in describing how high level
talent is currently assessed and in suggesting opportunities and possible
road blocks in the development of improved criteria.
Another generally desirable strategy would be to foster the development
of such intermediate criteria as depicted in Figure 1. This might involve
faculty ratings of particular types of accomplishments, special means of col-


is relevant to the most important training objectives.
One way of encouraging the development of such criteria is through the
institutional studies described under Objective I. Another is the
special developmental project which may be required in the
case of an unusually complex criterion such as scientific creativity. The
present GRE research on scientific creativity is concerned specifically with
the development of an intermediate criterion.

Another potentially useful approach starts with the observation that
practically all validity studies incorporate criteria based upon student per-
formance in graduate school. This typical design has two shortcomings if
one is interested in confirming the "ultimate" social relevance (i.e., con-
struct validity) of the GRE. First, typical validity studies are not directly
relevant to the question of whether the GRE are effective in selecting for
graduate study people who are likely to reach the highest levels of profes-
sional success. While screening prospective professionals from a group of
graduate students is primarily the responsibility of the graduate schools,
tests used for earlier screening should not be counterproductive in that
process; that is, one would like to know that very successful professionals
have typically scored well so that screening out students with low scores is
both efficient and defensible. Second, studies that use success criteria rela-
tive to the standards of individual institutions may seriously underestimate
the usefulness of the examinations because the range of talent is typically
restricted at individual institutions, but ranges widely from one institution
to another. Thus, the GRE may be relatively poor predictors of graduate
performance in a single prestigious history department, but may provide a
reasonably good indication of differential competence among graduates of
all history departments.
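
The effect of restricted range described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not an analysis of GRE data: test scores and outcomes are drawn as standardized normal variables with an assumed population correlation of 0.5, and a hypothetical selective department admits only applicants scoring at least one standard deviation above the mean. The function names, the cutoff, and the correlation value are all illustrative choices.

```python
import random
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_range_restriction(n=20000, rho=0.5, cutoff=1.0, seed=0):
    """Return (r over all applicants, r among those with score >= cutoff).

    Assumption: score and outcome are standard normal with correlation rho.
    """
    rng = random.Random(seed)
    scores, outcomes = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # Build an outcome whose population correlation with x is rho.
        y = rho * x + (1.0 - rho ** 2) ** 0.5 * noise
        scores.append(x)
        outcomes.append(y)

    r_all = pearson(scores, outcomes)

    # Hypothetical selective department: only high scorers are observed.
    admitted = [(x, y) for x, y in zip(scores, outcomes) if x >= cutoff]
    r_admitted = pearson([x for x, _ in admitted], [y for _, y in admitted])
    return r_all, r_admitted


if __name__ == "__main__":
    r_all, r_admitted = simulate_range_restriction()
    print(f"all applicants: r = {r_all:.2f}")
    print(f"admitted only:  r = {r_admitted:.2f}")
```

Under these assumptions the correlation computed within the selected group is noticeably smaller than the correlation over the full applicant range, which is the sense in which single-institution studies may understate the usefulness of the examinations.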

It might be worthwhile, therefore, to examine the feasibility of determin-
ing Aptitude Test score levels for pertinent groups of individuals who have
achieved some formal measure of success in their field. These might include
such ad hoc groups as fellows of learned societies, officers of professional
organizations, faculty of prestigious departments, individuals listed in
honorific biographies, and so on. Comparison of the scores of such indi-
viduals with appropriate normative groups would be interesting, even ad-
mitting the possibility that some individuals may achieve prestigious status
partly because at one time they were known to have scored highly on the
test, or in spite of having scored poorly.
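
One simple way to express such a comparison is to locate a group's mean score within the normative distribution. The sketch below is illustrative only: the scores, the norm mean of 500, and the norm standard deviation of 100 are hypothetical values chosen for the example, the function name is invented here, and the percentile assumes the normative scores are approximately normal.

```python
from statistics import NormalDist, fmean


def standing_vs_norm(group_scores, norm_mean, norm_sd):
    """Return (group mean, standardized difference, approximate percentile).

    Assumption: the normative distribution is roughly normal, so the
    percentile of the group mean can be read from the normal CDF.
    """
    group_mean = fmean(group_scores)
    z = (group_mean - norm_mean) / norm_sd
    percentile = 100.0 * NormalDist().cdf(z)
    return group_mean, z, percentile


if __name__ == "__main__":
    # Hypothetical scores for a small group of eminent professionals.
    mean, z, pct = standing_vs_norm([600, 650, 700], norm_mean=500.0, norm_sd=100.0)
    print(f"group mean = {mean:.0f}, z = {z:.2f}, percentile = {pct:.1f}")
```

A comparison of this kind is descriptive. As the text notes, it cannot by itself separate the influence of ability from the influence that known test scores may have had on a person's later opportunities.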
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Objective IV: Population validity: How to improve
it and enhance understanding of it


Messick and Barrows (1972) used the term population validity in referring
to the generalizability of research findings across different populations. A
similar notion applies to the validity of tests and other psychometric mea-
sures. If a test leads to incorrect inferences about a particular population,
then the test is to that extent invalid. Incorrect inferences may result from
the fact that the test itself is not a good measure for that population or that
the test does not have the same relationship with the criterion for that popu-
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Objective V: To improve institutional use of summary
program data


‘eal fobs Gimsabeof only to rlation to tho indivtdgale who tale 


test. But test performance data are often reported about groups of exam-
inees and inferences are drawn concerning those groups and the educational
programs in which they have taken part. As the GRE program seeks to
serve better the interests and needs of institutional sponsors, such summary
data should be reported more frequently and systematically. The prospect of
more systematic reporting of program data intensifies the need to insure
that such data serve the useful purposes


74 process to serve their institutional { 
pagel pvedinppo degtaenrimpene ager) of as 8 
hing pool of individuals; viz., beget a aap oh a 


- degree. 

characteristics of these successively diminishing groups,
personal and demographic characteristics that operate most strongly in the
Sear an tl cmt en — 


gram to the Undergraduate Assessment Program at ETS. ...
this new service is to develop additional subscores for the GRE Advanced
Tests and report them in summary form for groups of students in under-
graduate departments. This sort of reporting can be quite useful to depart-
ments if it emphasizes the comparison of department objectives with stu-
dent performance on corresponding parts of the examinations. The design of
interpretive materials of this sort could be an important contribution of
the GRE Board ‘ave given. 
at program renewal over “gare 
‘years, Naturally this effort is experimental and highly, pata 
vé, bu i dns blade € niscber af peoeisht chase or additions to thes, 
picteenssening tees a possible shortening of the Aptitude-Test and 


purposes than'selective admissions, and pond ty 
for insti ations and students. 


change should not be taken lightly. It sho 
. reasongble essurance pace fepenpetadonrl have phere bn 
validity as they were intended. Such assurance should be a conscious objec-
tive of the Board. Specific necessities for validation will arise as new pro-
gram components are developed. Possibilities already apparent include the
following:
• examination of the factorial and predictive validity of new versus old
  forms of the verbal and quantitative Aptitude Test;
• development of a normative framework for the interpretation and pos-
  sible operational use of a measure of cognitive style;


mediate criteria of scientific creativity;
. — of an inventory, of abit ae 


They are, however, indicative of the diverse
issues that require attention if the GRE Board is to feel reasonably confident
that its examination program does exhibit construct validity in all essential
respects. In that spirit the foregoing should be viewed as a possibly useful
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jework, and aniation, not a prescription. 
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The following lists under each objective the relevant current
research as well as projects conducted and reported upon in the past. As sug-
gested by the previous discussion, however, there is need for further re-
search. Some has already been proposed to the GRE Board
Research Committee,
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• Predicting graduate school success (Lannholm and Schrader, 1951)
• Predicting success in Yale School of Forestry (Olsen, 1965)
• Abstracts of selected validity studies (Lannholm, 1960)
• Review of validity studies (Lannholm, 1968a)
• Cooperative validity studies (Lannholm, Marco, Schrader, 1968)
• Summary of validity studies (Lannholm, 1972)
• Predicting success in graduate education (Willingham, 1974)


Current Project
• Cooperative institutional validity studies (Wilson)


Objective II: To deal effectively with methodological
issues concerning validity that require
the GRE program's initiative


Reports
• The test chooser (Rock, 1974)
• Effects of option weighting on validity (Reilly and Jackson)
• Population moderators (Rock, 1975)
• Bayesian and least squares prediction (Boldt, 1975)
• Prediction of Ph.D. attainment in psychology, mathematics, and chem-
  istry (Rock)
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Reports
• A study of the Advanced sy Test (Campbell, Pend, and Lannholm, 1966)
• Critical incidents of graduate performance (Reilly, 1974a)
• Factors in graduate performance (Reilly, 1974b)
• Criterion rating scales (Carlson, Reilly, Mahoney, and Casserly, 1976)
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Reports ng ndires “ 
© Prediction of graduate ate of, erin students (Hetvey aod. 
Pitcher, 1963), : 
• Use of TOEFL and GRE in predicting success of foreign students
  (Sharon, 1974)
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Current Projects
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• Cognitive style longitudinal study (Witkin and Ward)
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